To: Distribution


From: T. H. Van Vleck, S. H. Webber, A. Bensoussan
Date: 98/07/74


Subject: Implementation of Proposed New Storage System


This memorandum presents the implementation choices for the 
proposed new Multics Storage System described in MTB-055. 


The reader is assumed to be generally familiar with the
operation of the current Multics Storage System.


REVIEW 


Five problems with the current Storage System were
identified and five goals proposed. These were:


1. PROBLEM: The Storage System loses information.
   GOAL: Eliminate loss of information by reducing the number
   of crashes, by limiting the damage done by crashes, and by
   minimizing loss of information during recovery procedures.


2. PROBLEM: Backup and recovery procedures cost too much.
   GOAL: Minimize system down time and devote fewer resources
   to backup functions.


3. PROBLEM: Large amounts of storage cannot be handled.
   GOAL: Make extremely large storage configurations usable
   without imposing a penalty in performance, reliability, or
   availability.


4. PROBLEM: Several desirable features should be added,
   including support for removable disk packs.
   GOAL: Add support for removable disk packs and other
   features.


5. PROBLEM: The operator interface is deficient.
   GOAL: Improve the operator interface, especially in the
   areas of shrinking and expanding the device complement,
   operating a crippled system, and providing recovery
   information.


Multics Project internal working documentation. Not to be
reproduced or distributed outside the Multics Project.
Overview of the Proposal


The physical storage available on a Multics configuration
will be grouped into partitions, as it is now. The MULT
partition will be further subdivided into logical volumes, which
may consist of part of a physical volume, or several physical
volumes. All physical volumes which comprise a logical volume
must be mounted or dismounted at the same time, so that a logical
volume may not be partially mounted; but logical volumes can be
added to or removed from the MULT partition while the system is
running. A physical volume may contain storage for only one
logical volume; the reason for allowing "fractional" physical
volumes is to accommodate the DUMP, LOG, SALV and BOS partitions
without requiring that the minimum system configuration have two
volumes.
Figure 1: Example of Physical Volume Usage


Every segment in the new hierarchy will have all its pages
allocated on the same physical volume. All segments in the same
directory will be contained in the same logical volume.




Each physical volume has a label, recorded by a special BOS
utility, which describes the storage extents on the volume and
the name of the logical volume it provides storage for. A
physical volume is part of only one logical volume. Each
physical volume has a volume unique identifier (VOLUID), used by
the system to identify the volume.


Each physical volume in the new Storage System has a Volume
Table of Contents (VTOC) which contains an entry for every
segment on the volume. The VTOC entries contain the information,
formerly present in the directory branch, which describes the
physical storage occupied by the segment. All pages of a segment
will reside on the same volume. Each volume also contains a
Volume Map, which has an entry for each page on the volume
describing its current status.


There is no FSDCT in the new Storage System. The Volume
Maps and the configuration deck provide the information which
used to be contained in the FSDCT. Volumes listed in the
configuration deck are called permanent volumes, and cannot be
dismounted. All directories must reside on one permanent logical
volume designated in the configuration deck; no directory may
ever be off-line. The logical volume which contains the root may
consist of several physical volumes, and a configuration need
have only this one logical volume defined.


The directory branch and the VTOC entry for a segment are
connected by a VTOC pointer stored in the branch. This pointer
is a pair of 36-bit quantities which specify the VOLUID and the
location within the VTOC where the VTOC entry resides. The VTOC
pointer's second component, the VTOC index, is only interpreted
within the context of the specified volume. Both the branch and
the VTOC entry contain the unique ID of the segment, and both
unique IDs must match if the system is to consider the
association valid.
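The branch/VTOC association rule can be illustrated with a minimal Python sketch. It models the rule only and is not Multics code; `VtocPointer`, `Branch`, `VtocEntry`, and `association_valid` are hypothetical names, and mounted volumes are assumed to be a dictionary from VOLUID to that volume's VTOC (itself a dictionary from VTOC index to entry).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VtocPointer:
    """Hypothetical model of the two 36-bit words stored in a branch."""
    voluid: int       # volume unique identifier
    vtoc_index: int   # meaningful only within the volume named by voluid


@dataclass
class VtocEntry:
    uid: int          # segment unique ID as recorded in the VTOC entry


@dataclass
class Branch:
    uid: int          # segment unique ID as recorded in the directory branch
    vtoc_ptr: VtocPointer


def association_valid(branch: Branch, volumes: dict) -> bool:
    """Return True only if the VTOC pointer resolves and both unique IDs match.

    `volumes` maps VOLUID -> {vtoc_index: VtocEntry}; this layout is an
    illustrative assumption.
    """
    vtoc = volumes.get(branch.vtoc_ptr.voluid)
    if vtoc is None:               # volume not mounted or not known
        return False
    entry = vtoc.get(branch.vtoc_ptr.vtoc_index)
    return entry is not None and entry.uid == branch.uid
```

The VTOC index is looked up only inside the VTOC of the volume named by the VOLUID, reflecting the rule that the index has no meaning elsewhere.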


The system will maintain a table in wired-down storage known
as the Device Table, which has one entry per disk drive in the
configuration, specifying the VOLUID for the volume mounted on
the drive, the DIM parameters necessary to run the drive, and
other data.


The system will also have a more extensive ring 1 data base
which registers each logical volume known at the installation,
lists the physical volumes involved and the volume owner, and
provides an access control list for mounting control.
DESIGN STRATEGIES


The focus during the initial design of the new Storage
System will be on getting a version which runs and is
functionally correct, in as short a time as possible. Adequate
system performance is also an important functional requirement.
But some functional extensions will be postponed to later phases
of system development in order to get the new system on the air
quickly.


Data Bases 


Wherever possible, data bases will be modified in a
compatible fashion, leaving previously-defined items where they
were. For example, the directory branch need not change size;
the removal of file maps will be compensated for by the addition
of a VTOC pointer, but no attempt will be made to (say)
re-structure ACLs.


Supervisor Calls


It is hoped that all current supervisor calls will continue
to function exactly as they do now, with one or two exceptions.
A few new entries will be added, and one or two new status codes
may be possible (for example, "logical volume not on-line").


Algorithms

Straightforward code will be much easier to debug and
maintain, so our preference will be to implement the new Storage
System with less mechanism rather than more. This is especially
true for the first phase of the implementation.


For instance, the current system has a fairly complicated
mechanism for allocating variable-sized file maps in the
directory. When file maps are moved to the VTOC, this strategy
will be eliminated, and each VTOC entry will contain space for
the maximum-size file map.


Security Considerations


The additional security controls described in MTB-086 will
be supported by the new Storage System. Care will be taken to
ensure that no new ways of communicating between users of
different access authorizations are introduced.


In order to prevent unauthorized communication between users
with different attributes, by means of quota manipulation or
signalling by mounting and dismounting, logical volumes which can
be dismounted may contain segments from only one sensitivity
level and category set. The level, category set, and minimum
ring number for each dismountable volume are kept in the on-line
logical volume registration data, and are also recorded in the
volume labels.


To guard against accidental disclosure of information in the
event of a system failure, the new Storage System takes
considerable care to avoid re-used addresses. Also, all free
pages on disk are explicitly required to be zero, so that even if
an unused page is mistakenly added to a segment as a result of a
system crash or a dropped bit, the system will not compromise
security.


Sizes of Data Fields 


One problem the new Storage System solves is that of
providing for much more physical storage in a Multics
configuration than the current supervisor can handle. Part of
the current problem arises because we wish to support future
hardware enhancements which may provide storage devices with much
larger capacity. The recent change to the whole Storage System
and to BOS needed to support DSU-191's was able to find a free
bit; the next such change would require restructuring of many
system data bases.


For this reason the disk record address is being changed in 
format. The current address is 


     device ID        bit (4)
     device address   bit (18)


where the device ID, ranging from 1 to 7 (0 is used for null
addresses, and the high-order bit indicates that the page is on a
special device, e.g. the paging device), specifies the storage
subsystem (DSU-191, etc.) according to a table in the FSDCT.


The new address format expands from 21 bits to 36. It looks
like this:


     device table index   bit (18)
     record address       bit (18)


The old "device address" coded both disk drive number and address
on the disk pack into 18 bits; this has been changed so that we
have an effective width of 36 bits, with the device table index
used to select the proper disk drive and the record address being
strictly an offset within the volume. The "device ID" is located
in the device table, and selects the strategy and coding scheme
used to run the device. The new address format provides for up
to 256K devices on-line, each device having a capacity of 256K
records. Since the current capacity of a DSU-191 pack is about
20,000 records, we are prepared to support a tenfold increase in
the capacity of a single pack; if devices with more capacity are
produced, we can define several logical volumes on one physical
volume. The total amount of storage which can be supported by
the system increases by a factor of 32K (2^36 records instead of
2^21), to about 281 trillion (2^48) characters.
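The field widths and capacity figures above can be checked with a short Python sketch. The helpers `pack_address` and `unpack_address` are hypothetical, the placement of the device table index in the high-order 18 bits is an assumption (the memo gives only the two widths), and a record is taken to be 1024 words of four 9-bit characters.

```python
FIELD_BITS = 18
FIELD_MASK = (1 << FIELD_BITS) - 1        # 0o777777


def pack_address(device_table_index: int, record_address: int) -> int:
    """Combine two 18-bit fields into one 36-bit disk address (assumed layout)."""
    for value in (device_table_index, record_address):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"{value} does not fit in {FIELD_BITS} bits")
    return (device_table_index << FIELD_BITS) | record_address


def unpack_address(address: int) -> tuple[int, int]:
    """Split a 36-bit disk address into (device table index, record address)."""
    return address >> FIELD_BITS, address & FIELD_MASK


# Capacity arithmetic, assuming 1024-word records and four 9-bit characters per word.
CHARS_PER_RECORD = 1024 * 4
DEVICES = 1 << FIELD_BITS                  # 262,144 devices ("256K")
RECORDS_PER_DEVICE = 1 << FIELD_BITS       # 262,144 records per device
TOTAL_CHARS = DEVICES * RECORDS_PER_DEVICE * CHARS_PER_RECORD   # 2**48
GROWTH_FACTOR = (1 << 36) // (1 << 21)     # 21-bit to 36-bit addressing: 2**15
```

With these assumptions `TOTAL_CHARS` is 281,474,976,710,656, about 2.8 × 10^14 characters, and the growth from 21-bit to 36-bit addressing is the factor of 32K quoted above.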


The disk record address is never interpreted except in the
context of its own volume. Different volume-addressing schemes
and VTOC layouts could exist compatibly within the same
configuration on different volumes.


The "VTOC index" stored in a branch is used at segment
activation time to locate the correct VTOC entry on the volume.
Like the disk record address, this number can be interpreted only
in the context of the volume it refers to. The VTOC index might
be coded as a record address on the volume plus an offset, or as
a subscript in a fixed-length array, or as some sort of hash
address into the VTOC. Making this field 36 bits wide ensures
that whatever clever coding scheme is used will have enough bits
available.


Command Changes


The list and status commands should have options to list the
logical volume on which a segment resides, and to indicate
whether a segment is on-line. Some redefinition of the items
printed by the default invocation of the list command would be a
good idea, to ensure that the command will reference only the
directory unless the user explicitly requests otherwise.


An active function "on_line" would be useful for checking
whether a segment is currently on-line.


A new command "set_vol" and an option to status to return
the logical volume ID will be needed to handle the volume ID
associated with each directory.


New commands are needed to request the mounting and 
dismounting of volumes. 


New Supervisor Entries 


A new hcs_ entry must be provided, or the status entry modified,
to return the name of the logical volume on which a segment
resides. New calls are also necessary to set and get the logical
volume ID associated with a directory.
ALGORITHMS


This section describes the sequences of operations performed
by the new Storage System for various system functions. Each
function is described in terms of its differences from the
current Storage System function.


Branch Creation 


When append is called to create a new branch, it must
determine the correct volume for the storage associated with the
branch and allocate a VTOC entry on that volume. The VTOC
pointer to the new VTOC entry is then stored in the branch.


In order to create a segment, a user must have append
permission on the parent directory and meet the usual validation
level and security level constraints.


To determine the volume, append obtains the logical volume
name from the directory header. If more than one physical volume
is a member of the logical volume, the physical volume with the
most free space is chosen to receive the new segment (unless the
volume has a switch set which makes it appear "full," as may
happen when a logical volume is being compressed).
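The choice of physical volume during append might look like the following Python sketch. `PhysicalVolume`, `appears_full`, and `choose_volume` are hypothetical names, and the sketch assumes free space is measured in free records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhysicalVolume:
    name: str
    free_records: int
    appears_full: bool = False   # switch set, e.g. while the logical volume is compressed


def choose_volume(members: list[PhysicalVolume]) -> Optional[PhysicalVolume]:
    """Pick the member with the most free space, skipping volumes whose
    "full" switch is set; return None if no member is eligible."""
    eligible = [v for v in members if not v.appears_full]
    if not eligible:
        return None
    return max(eligible, key=lambda v: v.free_records)
```

A volume with its switch set is skipped even if it has the most free records, which is how a logical volume can be drained while it is compressed.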


Once the volume is determined, append locks the directory
and allocates the branch as it does now. Next, append calls the
VTOC_manager to request the creation of a VTOC entry on the
appropriate volume. If the VTOC for the chosen volume is full,
append returns an error code and does not create the branch.


The VTOC entry is initialized by the VTOC manager when the
entry is allocated. Once a VTOC entry has been allocated,
modifications to the VTOC entry are adequately protected by the
parent directory lock.


Making a Segment Known 


Making a segment known does not require a reference to its
VTOC entry.


Segment Fault

The system's processing of a segment fault has two parts:
first, the supervisor determines whether the segment faulted on
is active. If so, it is only necessary to connect the SDW for
the segment to the page table and return. If the segment faulted
on is not active, it must be activated. To determine whether a
segment is active, the supervisor will obtain the unique ID of
the segment from the KST entry and search the AST for the
Activation


To activate a segment the supervisor obtains the parent
directory's segment number and the location of the segment's
branch from the KST, locks the parent directory, and then calls
VTOC_manager to cause the VTOC entry for the segment to be read
into a wired buffer. Next, the system locks the AST and obtains
an ASTE of the appropriate size. (This step may cause some other
segment to be deactivated.) The ASTE is filled in from the VTOC
entry and the branch.


When the page table is being filled in, the system may
encounter null addresses in the file map. These are represented
as PTWs with a "null address" flag on, which the system will
check at page fault time.


Page Fault 

When a process encounters a page fault, the supervisor
checks the PTW being faulted on to see if the "null address" flag
is on. If not, the disk record address from the PTW, together
with the Device Table index in the AST entry, is used to generate
a disk address for the device I/O.


If the supervisor encounters a fault on a PTW with the "null
address" flag, the supervisor will give the segment a block of
zeroed core.
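A minimal Python sketch of the page-fault path just described. `PTW`, `page_fault`, and the `read_record` callback are hypothetical stand-ins for page control and the disk DIM, and a page is modeled as a list of 1024 words.

```python
from dataclasses import dataclass
from typing import Callable, Optional

PAGE_WORDS = 1024


@dataclass
class PTW:
    null_address: bool = False
    record_address: Optional[int] = None   # offset within the segment's volume


def page_fault(ptw: PTW, device_table_index: int,
               read_record: Callable[[int, int], list]) -> list:
    """Return the contents to place in core for a faulted page.

    read_record(device_table_index, record_address) is a hypothetical stand-in
    for the disk DIM; it is never called for a PTW with the null-address flag.
    """
    if ptw.null_address:
        return [0] * PAGE_WORDS            # no disk record yet: supply zeroed core
    # The PTW holds only the record address; the device comes from the ASTE.
    return read_record(device_table_index, ptw.record_address)
```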


When a page is being written out, the supervisor will
examine the page to see if it is all zero. When the current
supervisor detects this situation, it does not write the page,
but simply frees the disk address. This behavior is thought to
be the cause of many of the re-used address problems which the
current system encounters. In the new Storage System, zero pages
will be written back to disk, and a "zero page" flag set in the
PTW. Pages with this flag on will be freed at deactivation time,
that is, when the VTOC is updated to reflect the fact that the
record is no longer being used by this segment. This strategy
ensures that all free records on disk are zero, so that damage to
a disk pack is much less likely to introduce old information into
a file.


When a page is to be written out, if the "null address" flag
is on, a disk record will be assigned on the appropriate volume.
If the page is all zeroes, of course, this step can be
eliminated.
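The write-out rules for zero pages and null addresses can be sketched in Python as follows. The `PTW` model, the list used as a free-record pool, and the dictionary standing in for the disk are illustrative assumptions, not Multics data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PTW:
    null_address: bool = False
    zero_page: bool = False
    record_address: Optional[int] = None


def write_page(ptw: PTW, contents: list, free_pool: list, disk: dict) -> None:
    """Write a page back to its volume.

    free_pool: hypothetical list of free record addresses on the segment's volume.
    disk: hypothetical dict standing in for the volume, record address -> contents.
    """
    all_zero = not any(contents)
    if ptw.null_address:
        if all_zero:
            return                         # still null: no record assigned, nothing written
        ptw.record_address = free_pool.pop()
        ptw.null_address = False
    # Zero pages are written too, so a record freed later is known to be zero.
    disk[ptw.record_address] = list(contents)
    ptw.zero_page = all_zero               # if set, the record is released at deactivation
```

Because a zero page is still written, the record it occupies already holds zeros by the time it is released at deactivation.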


MULTICS TECHNICAL BULLETIN Page 9


Bounds Fault


Bounds faults will be greatly simplified in the new Storage
System. Since all file maps are full size, there is never a need
to re-allocate a file map in the middle of bound fault
processing. This means that a bound fault need only re-allocate
AST entries. Since the ASTEP has been removed from the branch, a
bound fault does not need to modify the branch, and therefore the
directory need not be locked.


Deactivation 


When a segment is deactivated, any disk records
corresponding to PTWs with the "zero page" flag on will be
released. The data in the ASTE are written back to the VTOC by
VTOC_manager, which does not lock the parent directory. A system
performance improvement can be expected for deactivation since
the branch need not be referenced; this change eliminates the
paging in and writing out of the directory page(s) for the
branch. In addition, the references to the directory header page
for the locking and unlocking operations on the parent directory
are eliminated.
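Deactivation's handling of zero pages can be sketched in Python as follows, using a hypothetical `PTW` model. The dictionary standing in for the VTOC entry and its `records_used` field are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PTW:
    null_address: bool = False
    zero_page: bool = False
    record_address: Optional[int] = None


def deactivate(ptws: list, free_pool: list, vtoc_entry: dict) -> None:
    """Release zero-page records and rebuild the file map kept in the VTOC entry.

    vtoc_entry is a hypothetical dict; in the memo the update is done by
    VTOC_manager without locking the parent directory.
    """
    file_map = []
    for ptw in ptws:
        if ptw.zero_page and ptw.record_address is not None:
            free_pool.append(ptw.record_address)   # the record on disk is already zero
            file_map.append(None)                   # becomes a null address
        elif ptw.null_address:
            file_map.append(None)
        else:
            file_map.append(ptw.record_address)
    vtoc_entry["file_map"] = file_map
    vtoc_entry["records_used"] = sum(a is not None for a in file_map)
```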


Making a Segment Unknown 

No change is made to makeunknown.


Truncation

When a segment is truncated, the pages of the segment will
be explicitly zeroed and the zero pages written out to disk.


Deletion 


When a segment is deleted, it is first truncated, to ensure
that the disk pages it occupies are zeroed. The sequence for
deleting the segment is:


     lock directory
     delete branch
     call VTOC_manager to delete VTOC entry
     unlock directory


It may be possible to unlock the directory before calling
VTOC_manager.
Directory Modifications

When the system crashes while a directory is being modified,
the salvager frequently finds the directory in an inconsistent
state. It is possible for the salvager to do a great deal of
damage to the hierarchy in the attempt to correct the directory.
What seems to happen is that a directory is locked, and the
supervisor starts making some change in the directory, for
example adding a new ACL entry, and gets far enough in this
operation so that the directory is inconsistent when the
directory page is removed from core in the normal course of core
management. If system operation is then interrupted, the page of
the directory which is on disk must be repaired by the salvager
before the directory can be used. Core pages which are flushed
to disk by emergency_shutdown may also lead to this situation.
In the current storage system, if emergency_shutdown succeeds,
all potentially inconsistent directories can be detected by
examination of the ASTEP in the branch and the lock in the
directory header. The new storage system eliminates these items,
and might appear to make the job of the salvager more difficult.


It would be far better from the point of view of reliability
if the inconsistent directory pages were never written to disk.
This might occasionally cause operations which the user thought
had completed just before a crash to be lost, but it would mean
that for almost all system crashes, no salvaging of the directory
structure was necessary.


To accomplish this goal, the operation of locking a
directory will set switches honored by page control which will
prevent any pages of a directory which is locked from being
written out to disk. (The pages may be claimed if they have not
been modified, and if a paging device is available the pages may
be moved to the paging device.) These switches must be respected
by page control and emergency shutdown. (If pages of locked
directories may go to the paging device then the salvager must
respect such a switch in the paging device map too.)
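Page control's side of this rule might be sketched as two predicates. This Python fragment is hypothetical: the per-directory switches are modeled as a set of UIDs of currently locked directories, and `CorePage` records only the owning directory and the modified bit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CorePage:
    directory_uid: Optional[int]   # UID of the owning directory, or None for other segments
    modified: bool


def may_write_to_disk(page: CorePage, locked_dirs: set) -> bool:
    """Pages of a locked directory must never reach disk."""
    return page.directory_uid is None or page.directory_uid not in locked_dirs


def may_reclaim(page: CorePage, locked_dirs: set, paging_device_available: bool) -> bool:
    """Can page control free this core frame?"""
    if not page.modified:
        return True                 # unmodified: disk copy is still good, nothing to write
    if may_write_to_disk(page, locked_dirs):
        return True
    return paging_device_available  # locked directory page may move to the paging device
```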


Paging Device Management 


No significant changes are planned to paging device
management.


Volume Mounting


When a user wishes to request the mounting of a logical
volume, the pattern works somewhat like that proposed for tape.
The request is validated by ring 1 and passed to the system
control process, where a message is typed to the operator. When
the operator has mounted the volume, he issues a command to
inform the system that the physical volume is mounted. The
supervisor will translate requests for the mounting of a logical
volume into multiple requests for the mounting of physical
volumes, if necessary.


Registration information, including an access control list,
will be maintained for each logical volume, in a ring 1 data
base. This information will include the list of physical
volumes, owner identifications, and access control and security
information.


Volumes are connected to the Storage System by the
supervisor either at system initialization or in response to a
user mount request call passed from ring 1. After verifying that
the drive is ready and that the volume label, VTOC, and Volume
Map are correct and self-consistent, the system makes an entry in
the Device Table showing that the volume is on-line.


Before a connection is made, each VTOC entry is validated
(its current segment length and number of pages must agree with
the file map), and the VTOC file maps are then checked against
the Volume Map. If a file map address from a VTOC entry points
to a disk record marked free in the Volume Map, the record will
be marked as used. If two file map addresses point to the same
Volume Map entry, the Volume Map and both VTOC entries will have
the record freed, and the record will be zeroed.
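The file map cross-check performed before a volume is connected can be sketched in Python. The dictionary layouts for the VTOC and Volume Map and the `zero_record` callback are hypothetical, and the per-entry length validation mentioned above is not modeled.

```python
from collections import Counter
from typing import Callable


def check_file_maps(vtoc: dict, volume_map: dict, zero_record: Callable[[int], None]) -> None:
    """Cross-check VTOC file maps against the Volume Map before connecting a volume.

    vtoc: hypothetical dict, vtoc_index -> {"file_map": [record address or None, ...]}
    volume_map: hypothetical dict, record address -> True if marked in use
    zero_record: hypothetical callback that zeroes a record on disk
    """
    # Count how many file-map addresses name each record.
    refs = Counter(addr for entry in vtoc.values()
                   for addr in entry["file_map"] if addr is not None)
    for entry in vtoc.values():
        fmap = entry["file_map"]
        for i, addr in enumerate(fmap):
            if addr is None:
                continue
            if refs[addr] > 1:
                fmap[i] = None             # claimed twice: freed in every entry
            else:
                volume_map[addr] = True    # in use, even if the map said free
    for addr, count in refs.items():
        if count > 1:
            volume_map[addr] = False       # freed in the Volume Map
            zero_record(addr)              # and zeroed, keeping free records zero
```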


Volume Dismounting


Volume dismounting may be the result of an explicit request
by the user who mounted a volume, or the dismount request may be
issued by a privileged process.


The supervisor must not allow the dismounting of a volume if
there are still pages in core or on the paging device which have
not yet been written to the device. Each volume will have a
switch which prevents any more activations. A program similar to
shutdown can then set the switch and loop through the AST
deactivating segments on a volume which is to be dismounted.
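The dismount procedure might be sketched as follows in Python. `Volume`, `ASTE`, `prepare_dismount`, and the `deactivate` callback are hypothetical names; `deactivate` is assumed to flush the segment's pages from core and the paging device and to report whether it succeeded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Volume:
    voluid: int
    inhibit_activation: bool = False   # the per-volume switch described above


@dataclass
class ASTE:
    uid: int
    voluid: int


def prepare_dismount(volume: Volume, ast: list, deactivate: Callable[[ASTE], bool]) -> bool:
    """Set the switch, then deactivate every active segment on the volume.

    deactivate(aste) is a hypothetical callback that flushes the segment's pages
    and updates its VTOC entry, returning False if it could not finish.
    Returns True when the volume is safe to dismount.
    """
    volume.inhibit_activation = True
    for aste in [a for a in ast if a.voluid == volume.voluid]:
        if deactivate(aste):
            ast.remove(aste)
    return not any(a.voluid == volume.voluid for a in ast)
```

Setting the switch first keeps new activations from refilling the AST with segments on the volume while the loop runs.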


Once a volume has had all its segments deactivated and all
pages flushed, it is safe to dismount it. Any known segments
which have been dismounted will cause seg_fault_error conditions
if they are referenced.
File System Initialization


The program initialize_dims is called to start up the
Storage System. Its first step is to read the CONFIG deck and
initialize the disk DIMs for each disk type listed on a PRPH
card. It then reads the INTK card, determines the correct
partition, and locates the PART card for the partition. The PART
card lists the logical volumes which are in the partition; the
first logical volume listed must contain the root. The logical
volumes are described by VOL cards which tell which physical
units contain the physical storage for the logical volume. Each
volume is connected to the system, starting with the root volume.


The root volume contains a pointer in the volume label to
the VTOC entry for the root directory. It may be possible to
have a special segment which contains a root branch, in order to
eliminate various pieces of complication in directory control.


Disk Record Assignment 


When the system attempts to write out a page which has the
"null address" flag on, the supervisor will assign a disk record
on the volume where the segment resides. For each volume
connected to the system, the supervisor keeps a pool of free
addresses in wired-down core. As record addresses are needed,
they are withdrawn from the pool. If a pool becomes empty, the
supervisor replenishes it by reading in a section of the Volume
Map and noting the free addresses. The pool is also added to by
pages released at truncation and deletion, and zero pages freed
at deactivation.
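A per-volume free-record pool might be sketched as the following hypothetical Python class. The batch size, the circular scan of the Volume Map, and marking records as in use while they sit in the pool are all assumptions; the memo does not say how much of the map is read per refill.

```python
class RecordPool:
    """Hypothetical per-volume pool of free record addresses kept in wired core."""

    def __init__(self, volume_map: list, batch: int = 64):
        self.volume_map = volume_map   # one bool per record: True = in use
        self.batch = batch             # free records collected per refill (assumed)
        self.free = []
        self.scan_pos = 0              # where the next Volume Map scan resumes
        self.map_reads = 0

    def _replenish(self) -> None:
        """Scan the Volume Map circularly and move up to `batch` free records into the pool."""
        self.map_reads += 1
        n = len(self.volume_map)
        for _ in range(n):
            record = self.scan_pos
            self.scan_pos = (self.scan_pos + 1) % n
            if not self.volume_map[record]:
                self.volume_map[record] = True   # reserved while it sits in the pool
                self.free.append(record)
                if len(self.free) >= self.batch:
                    break

    def withdraw(self) -> int:
        """Take a record for a page whose null-address flag is on."""
        if not self.free:
            self._replenish()
        if not self.free:
            raise RuntimeError("no free records on this volume")
        return self.free.pop()

    def release(self, record: int) -> None:
        """Return a record freed by truncation, deletion, or zero-page deactivation."""
        self.free.append(record)
```

`map_reads` counts refills; with records also coming back through `release`, it should grow slowly, which is the steady-state behavior the memo expects.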


Since page faults cannot claim very many pages a second
without exhausting the system's free space, the number of times
that the Volume Map must be consulted should be low when the
system is in steady state.


Access to the VTOC


The VTOC manager is a new program which will be responsible
for all accesses to the VTOC entries. It will have wired core
buffers of its own, and will access the VTOC by a special I/O
facility which will use 64-word disk reads and writes instead of
1024-word I/O.


When a request to read a VTOC entry is made, the VTOC
manager will first search the AST to see if the segment is
active, and if so will reconstruct the VTOC entry from the ASTE
and return the VTOC entry to the caller. If the segment is not
active, the core buffers will be checked to see if the VTOC entry
was recently used and is still in core. If not, a disk I/O request
will be issued for the VTOC entry; it will be read into a free
core buffer if one is available. If no buffer is available, the
oldest buffer will first be written to disk (if modified) and
then the read performed. Each buffer will have a "modified bit"
so that a VTOC entry need not be rewritten unless it has changed.
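The lookup order used by the VTOC manager can be sketched with a small Python class. `VtocBuffers` and its dictionary-based `disk` and `ast` are hypothetical, and "oldest buffer" is interpreted here as the buffer filled earliest (FIFO replacement), which is an assumption.

```python
from collections import OrderedDict


class VtocBuffers:
    """Sketch of VTOC_manager's wired buffers: AST first, then buffers, then disk.

    disk and ast are hypothetical dicts keyed by (voluid, vtoc_index); ast holds
    entries already reconstructed from ASTEs for active segments.
    """

    def __init__(self, nbuffers: int, disk: dict, ast: dict):
        self.nbuffers = nbuffers
        self.disk = disk
        self.ast = ast
        self.buffers = OrderedDict()   # key -> [entry, modified], oldest filled first
        self.disk_reads = 0
        self.disk_writes = 0

    def read(self, key) -> dict:
        if key in self.ast:                       # active: the ASTE has the current copy
            return dict(self.ast[key])
        if key not in self.buffers:
            if len(self.buffers) >= self.nbuffers:
                old_key, (old_entry, modified) = self.buffers.popitem(last=False)
                if modified:                      # rewrite only if it changed
                    self.disk[old_key] = old_entry
                    self.disk_writes += 1
            self.buffers[key] = [dict(self.disk[key]), False]
            self.disk_reads += 1
        return dict(self.buffers[key][0])

    def write(self, key, entry: dict) -> None:
        if key in self.ast:
            self.ast[key] = dict(entry)
            return
        self.read(key)                            # make sure the entry is buffered
        self.buffers[key] = [dict(entry), True]   # set the "modified bit"
```

An unmodified buffer is simply dropped when it is replaced, so only changed VTOC entries cost a disk write.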


The contents of the VTOC entry can be completely
reconstructed from the AST entry, so that when a segment is
active, we can assume that the only current copy of the VTOC entry
exists in the AST. This property allows us to write VTOC entries
out to disk from the AST entry data without having to first read
in the VTOC entry in order to update it.


Although the use of 64-word I/O in addition to the standard
1024-word I/O used by the paging mechanism adds some code and
some complexity to the supervisor, it provides several important
advantages for the management of VTOC information. Obviously,
the use of small "pages" for VTOC entries cuts down on the amount
of disk channel busy time and memory load, for any given rate of
access to the VTOC. The amount of wired-down storage needed to
buffer VTOC entries in core is decreased. But the most important
effect is to eliminate the unnecessary transportation of VTOC
entries and file maps for segments which are not being activated.
Our experience to date suggests that data are most often
destroyed when they are in core, or when they are transported to
and from core. Using 64-word I/O makes it more likely that a
system crash will destroy only segments which were actually in
use at the time of the crash.


Locking

The locking hierarchy looks like this:


     directory lock

     parent directory lock
     root directory lock

     AST lock

     VTOC manager lock

     page control global lock
     traffic control lock


The VTOC manager lock and the page control global lock will be on
the same level; that is, there is never an attempt to lock both
of these at once.
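One way to check the hierarchy mechanically is a per-process tracker, sketched here in Python. The numeric levels are an assumption read from the order of the list and from the statement that the VTOC manager lock and the page control global lock share a level; `LockTracker` is a hypothetical name.

```python
# Assumed levels: locks are taken in the order listed, and the VTOC manager
# lock and page control global lock share a level, so neither may be requested
# while the other is held.
LOCK_LEVEL = {
    "directory lock": 0,
    "parent directory lock": 1,
    "root directory lock": 2,
    "AST lock": 3,
    "VTOC manager lock": 4,
    "page control global lock": 4,
    "traffic control lock": 5,
}


class LockTracker:
    """Hypothetical per-process checker for the locking hierarchy."""

    def __init__(self):
        self.held = []

    def can_lock(self, name: str) -> bool:
        # A lock may be requested only if it comes strictly later in the
        # hierarchy than every lock already held.
        return all(LOCK_LEVEL[h] < LOCK_LEVEL[name] for h in self.held)

    def lock(self, name: str) -> None:
        if not self.can_lock(name):
            raise RuntimeError(f"lock order violation: {name} while holding {self.held}")
        self.held.append(name)

    def unlock(self, name: str) -> None:
        self.held.remove(name)
```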


Carrying Packs Between Sites 


Carrying packs between sites will be tricky. The VTOC
entries are valid, but what must be done is to construct branches
for each VTOC entry, and fill in VTOC pointers and valid unique
IDs. This operation must be privileged and done carefully. The
VOLUID must also be changed, since different installations may
have assigned the same VOLUID to different packs.


One way to construct the new branches to describe the
contents of a volume which has been imported into a site would be
to require the user to run a program which looks much like
backup_dump, which would write a small tape containing the
directory information only for all segments on the volume. Then
the user would carry both a pack and a small tape reel. A
convention could be established to permit the supervisor to place
the contents of this tape on the pack itself.


The ability to carry packs between sites will not be part of
the initial implementation.


Quota 


The quota mechanism will have the same basic elements as the
current scheme: that is, all segments in the same directory will
be charged to the same "quota cell", consisting of

     maximum records used
     current records used
     time-record product
     time last updated


Since all non-directory segments in the same directory must
reside on the same logical volume, one quota cell per directory
is sufficient. It will be stored in the VTOC entry and the
branch for the directory when the directory is not active, or in
the AST entry when the directory is activated. The storage for
pages of directories themselves is always on the logical volume
which contains the root. In order to prevent any user from
monopolizing storage on the root volume, each directory will
actually have two quota cells: one for directory pages only, and
the other for pages of non-directory segments. As in the current
system, there will be a value of quota which means that there is
no limit on the storage in this directory, but that some
higher-level directory's limit must be checked.


Figure 2: Quota Information for Each Directory
(BRANCH: D-Quota, ND-Quota; VTOCE: D-Usage, ND-Usage, D-trp,
ND-trp, D-time, ND-time; ASTE: D-cell, ND-cell)


Within this framework, it is possible (but not necessary) to
make a major improvement to the current quota mechanism which
should provide users with significantly more flexibility in
controlling their disk usage. Currently, when the Storage System
finds a quota which is nonzero, the storage for the segments
inferior to the directory with the quota is "charged" to the
quota, and no further checks are made. In the new scheme, the
Storage System then continues up the hierarchy, as long as the
logical volume identification matches, checking quota at each
level. There is no movequota operation, and quotas may be set
freely at any level by any user with modify permission on a
directory.


An example will make the use of this facility clear.
Suppose that a project is to be given a maximum quota of 100
records. The system administrator sets the project directory's
quota to 100. Now, suppose there are 10 users on the project.
The project administrator may then set a quota of 20 on each user's
home directory. Any user may use up to 20 records of storage,
provided that the project's total usage does not exceed 100
records. Thus, a user could possibly encounter several different
record quota overflows: either from his home directory, or from
the project directory, or from higher directories. In practice,
the quotas for the root and for >udd will be set to "infinite".
Only the quota cell nearest to the segment will accumulate a
time-record product.
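The upward quota walk can be sketched in Python using the numbers from the example. `Directory`, `QuotaCell`, `charge`, and the use of `None` for an unlimited quota are hypothetical; the sketch models only the non-directory quota cell, assumes usage is recorded in every cell on the chain, and omits the time-record product.

```python
from dataclasses import dataclass
from typing import Optional

INFINITE = None   # hypothetical encoding: no limit here, keep checking higher up


@dataclass
class QuotaCell:
    quota: Optional[int]
    used: int = 0


@dataclass
class Directory:
    name: str
    logical_volume: str
    cell: QuotaCell                        # the non-directory quota cell
    parent: Optional["Directory"] = None


def charge(directory: Directory, records: int) -> Optional[str]:
    """Try to charge `records` new records to segments in `directory`.

    Walks up while the logical volume matches, checking the quota at each
    level. Returns the name of the first directory whose quota would overflow,
    or None after charging every cell on the chain.
    """
    chain = []
    d = directory
    while d is not None and d.logical_volume == directory.logical_volume:
        chain.append(d)
        d = d.parent
    for d in chain:
        if d.cell.quota is not INFINITE and d.cell.used + records > d.cell.quota:
            return d.name
    for d in chain:
        d.cell.used += records
    return None
```

With ten users at 20 records each under a project quota of 100, the sixth user to fill a home directory is stopped by the project quota rather than by his own.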


Since the chain is broken when a directory with storage on a
different logical volume is encountered, the use of dismountable
volumes does not affect the normal quota mechanism on system
storage.
Repairs to the Hierarchy


When system operation is interrupted abruptly, the Storage
System data bases may have been left in an inconsistent state.
We have attempted to eliminate states which are inconsistent, or
to minimize the amount of time the system spends in these states.
Procedures which verify that the hierarchy is correct and repair
it if necessary will continue to be needed, though, because
hardware and software errors can occur which violate any of our
assumptions.


There are four repair operations which must be worked out:
emergency shutdown, pack salvage, tree salvage, and tree-VTOC
salvage.


EMERGENCY SHUTDOWN 


The emergency shutdown mechanism will work about the same as
it does now. When the system crashes and ESD is invoked, an
attempt will be made first to update the Volume Map on each
mounted volume. If this operation succeeds, an attempt will be
made to flush out all core pages, and then to deactivate all
segments (and update the VTOCs).


PACK SALVAGING 


This operation is performed whenever a volume is connected
to the system; it should take only a few seconds. It consists
of reading through the VTOC for the volume, examining each file
map, and checking the Volume Map entry for each page in the file
map to make sure that the page is recorded as being used by the
VTOC entry and is pointed to by only one file map address.
In all salvaging operations, it is not necessary to use 64-word
I/O, and the use of virtual I/O will make the code clearer and
more obvious.
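The pack-salvage check might look like the sketch below. It assumes, hypothetically, that each Volume Map entry records the VTOC index of the record's owner (None when free) and that a file map can be read as a list of record addresses. `pack_salvage` and the problem tuples it returns are illustrative names only.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def pack_salvage(vtoc: Dict[int, List[int]],
                 volume_map: List[Optional[int]]) -> List[Tuple]:
    """Check every file map against the Volume Map.

    vtoc maps a VTOC index to the record addresses in its file map.
    volume_map[addr] is assumed to hold the VTOC index owning record addr,
    or None if the record is free (a hypothetical entry format)."""
    problems: List[Tuple] = []
    refs = Counter(addr for fmap in vtoc.values() for addr in fmap)
    for addr, count in sorted(refs.items()):
        if count > 1:                       # pointed to by more than one file map address
            problems.append(("multiply_referenced", addr))
    for vtocx, fmap in sorted(vtoc.items()):
        for addr in fmap:
            if volume_map[addr] != vtocx:   # Volume Map disagrees with this file map
                problems.append(("not_owned", vtocx, addr))
    return problems
```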


In "long salvage" mode each disk record which is marked
allocated in the file map can be checked to see if it is zero,
and if so, the record can be released from the file map and the
VTOC entry adjusted. Free records can be checked to make sure
that they are zero. If a record which should be zero is found
nonzero, the data cannot be restored to its rightful owner, but
such a find is evidence that the system has probably lost some
data.
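A sketch of the long-salvage pass follows. It assumes a layout chosen only for illustration: each Volume Map entry holds the owning VTOC index or None, and each VTOC entry is a dict with a "file_map" from page number to record address and a "records_used" count. It releases all-zero allocated records and returns the free records found nonzero, which cannot be returned to an owner but indicate lost data.

```python
from typing import Dict, List, Optional


def is_zero(record: bytes) -> bool:
    return not any(record)


def long_salvage(disk: List[bytes], vtoc: Dict[int, dict],
                 volume_map: List[Optional[int]]) -> List[int]:
    """Long-salvage pass over one volume (hypothetical data layout).

    Returns the addresses of free records that are not zero."""
    suspicious = [addr for addr, owner in enumerate(volume_map)
                  if owner is None and not is_zero(disk[addr])]
    for entry in vtoc.values():
        for page, addr in list(entry["file_map"].items()):
            if is_zero(disk[addr]):
                del entry["file_map"][page]    # release the all-zero record
                entry["records_used"] -= 1     # and adjust the VTOC entry
                volume_map[addr] = None
    return suspicious
```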


TREE SALVAGING 
This operation is like the current salvager. Starting with
the root, the directory hierarchy is scanned and each entry is
checked for validity. Directory hash tables, ACLs, etc. are
rebuilt if necessary. If a branch cannot be made valid, the
branch is deleted. If a directory cannot be made valid, the
directory's branch is deleted and the directory segment deleted.


TREE-VTOC SALVAGING 


This operation is done after volume and tree salvaging. The
directory hierarchy is walked and, for each branch, the VTOC entry
is located and the UID match checked. Then, the VTOC for the
system volume is scanned, and VTOC entries not visited during the
tree walk are examined. For each such entry, there is no valid
branch. An attempt is made to construct such a branch by
examining the UID pathname in the VTOC extension: this may point
to a valid branch in a directory which has become detached from
the root by accident, or it may point to garbage. The salvager
follows the parent UIDs back until it finds the break in the
hierarchy, and constructs a new branch for the segment or subtree
in ">lost_and_found". Since the VTOC extension contains the
primary entry name for the branch, we may even be able to
rebuild the branch in the correct place.


In "long" mode all mounted volumes are processed. When
speed is important, the salvager can check only the system
volume. If this operation is fast enough, we will do it every
time we boot the system.
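The search for the break in the hierarchy can be sketched as follows. This assumes the VTOC extension can be read as a mapping from UID to (primary name, parent UID pathname listed root first), and that the tree walk yields the set of UIDs it reached. In the result, None stands for ">lost_and_found", and a directory UID means the subtree's branch could be rebuilt in place under its primary name. All names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical view of the VTOC extension:
# uid -> (primary name, UID pathname of the parent, root first)
Extension = Dict[str, Tuple[str, Tuple[str, ...]]]


def find_break(uid: str, ext: Extension,
               visited: Set[str]) -> Tuple[str, Optional[str]]:
    """Follow parent UIDs back from an orphaned VTOC entry.

    Returns (subtree_root, attach_dir). subtree_root is the highest detached
    ancestor that still has a VTOC entry. attach_dir is the intact directory
    it belongs in, or None if the chain leads to garbage."""
    top = uid
    for ancestor in reversed(ext[uid][1]):
        if ancestor in visited:
            return top, ancestor       # break lies just below an intact directory
        if ancestor not in ext:
            return top, None           # pathname points at garbage
        top = ancestor
    return top, None


def tree_vtoc_salvage(ext: Extension,
                      visited: Set[str]) -> Dict[str, Optional[str]]:
    """Plan one new branch per detached subtree."""
    plan: Dict[str, Optional[str]] = {}
    for uid in ext:
        if uid not in visited:
            root, attach = find_break(uid, ext, visited)
            plan.setdefault(root, attach)
    return plan
```

Attaching only the top of each detached subtree is enough, because the directories below it still contain their own branches.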


Backup 


Complete and catchup dumps can be replaced by physical dumps
(like the current BOS SAVE) for backup of most of the system. A
retriever can be written which will retrieve a segment from one
of these tapes given a VTOC index. It may even be possible to
run these dumps without shutting down, if users can accept the 5
minutes' wait which would be required for the satisfaction of a
segment fault encountered while a pack was being dumped.


An option to allow a user of a private dismountable pack to
request that his volume be dumped to tape would be desirable;
this might be an offline utility request.


The directory structure of the system can be backed up by a
"skeleton dump" similar to the current dump programs, but which
dumps only directory data, not segments. Incremental backup
dumps can be run if the installation wishes to provide protection
against the accidental deletion of segments.


The Storage System will have an option to cause specified
volumes to be written in duplicate on more than one physical
drive. A moderately large installation can cause a duplicate
copy of the root logical volume to be kept, and the system will
then be protected from disk catastrophes involving the directory
hierarchy. A later improvement might involve allowing the system
to automatically switch to the backup copy if the primary copy
went bad; such a proposal can be made later if experience shows
that the facility is useful.


Device Reservation 


Because there will be very little incentive for a user to
dismount a volume, unless the installation sets a very high price
for use of a dismountable volume, and because many users may be
using a volume's contents when it is mounted, most installations
will wish to establish some sort of schedule for permission to
use the disk drives which are available for user mounting. An
automatic device-reservation system to handle the scheduled
forced dismount of volumes on these drives and permission to
mount new volumes will be a necessity.


The interaction of this facility with the Access Isolation 
Mechanism must be considered carefully. 


Pack Initialization 
A BOS utility must be written to initialize a volume for use
with the new storage system. This utility must be able to label
volumes and build VTOCs and Volume Maps. It should be able to
zero an entire volume as well.


Error Recovery 


Several improvements are planned for the disk DIMs so that
when a disk drive or pack goes bad, the system will attempt to
keep running. One consequence of this desire is that the
supervisor will attempt to discover when it has typed out, say,
ten disk error messages for the same disk address or address
range, and automatically suspend use of this part of the disk
until made to start again. (Of course, this cannot be done for
the root volume.) Moving packs from one spindle to another is a
dangerous activity, especially when disk errors are occurring;
but sometimes this will cure disk problems, and it would be nice
to have the system well enough organized so that such a swap
could be made without crashing or shutting down. A special
interrupt or an operator command could be used to tell the system
to start retrying its I/O after the operator had attempted to
correct the problem.
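The suspend-after-repeated-errors rule could be kept per disk region, as in the sketch below. The threshold of ten and the root-volume exception come from the text. The fixed 64-record regions, the `restart` operator hook, and the class and method names are assumptions made for illustration.

```python
from collections import Counter


class DiskErrorMonitor:
    """Suspend use of a disk region after repeated errors (hypothetical sketch)."""

    def __init__(self, threshold: int = 10, region_size: int = 64,
                 is_root_volume: bool = False) -> None:
        self.threshold = threshold
        self.region_size = region_size          # assumed grouping of addresses
        self.is_root_volume = is_root_volume    # the root volume is never suspended
        self.errors = Counter()
        self.suspended = set()

    def region(self, addr: int) -> int:
        return addr // self.region_size

    def report_error(self, addr: int) -> bool:
        """Record one error; return True if the region is now suspended."""
        r = self.region(addr)
        self.errors[r] += 1
        if self.errors[r] >= self.threshold and not self.is_root_volume:
            self.suspended.add(r)
        return r in self.suspended

    def usable(self, addr: int) -> bool:
        return self.region(addr) not in self.suspended

    def restart(self) -> None:
        """Operator command: resume I/O on every suspended region."""
        self.suspended.clear()
        self.errors.clear()
```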
Operator Commands


The following operator commands must be provided:


reply to disk mount message
list mounted disks


Privileged Commands


The following commands must be provided for
programmer and system administrator use:


list mounted disks
list device table
list device reservations
force device dismount
force online pack salvage
force device reservation
DATA BASES


This section describes the format of various system data 
bases. 


Configuration Cards 


The following is an example of the configuration deck for
the new Storage System:


INTK 6 MULT
PART MULT V1 V2 V3 V4 V5
PART SALV V6


VOL V1 SRV DISK 0 0 404.
VOL V1 SRV DISK 1 0 404.
VOL V2 STO DISK 2 0 404.
VOL V3 STO DISK 3 0 404.
VOL V4 STO D18 0 0 202.
VOL V5 SCR DISK 4 0 202.
VOL V6 SAL DISK 4 202. 202.


PRPH DISK A 23 191. (disk DIM info)
PRPH D18 A 25 181. (disk DIM info)


The INTK card tells the system what partition to use. The
PART cards define which volumes make up the partition. If more
than 13 volumes are in a partition, additional PART cards with
the same partition name may be supplied.


The VOL cards name the logical volumes, and specify their
device type and location. In the example, "SRV" and "STO" are
flags which describe the use to be made of the volume, and "DISK"
and "D18" are (logical) device types which will be looked up in
PRPH cards. The other parameters on the VOL card are pairs of
<first-record, n-records> expressed in cylinders.
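A sketch of how a program might collect each logical volume's extents from the VOL cards follows. It makes two assumptions: the number immediately after the device type is the drive number, as the example deck suggests, and numbers follow the Multics configuration convention in which a trailing period marks a decimal value and any other number is octal. `parse_vol_cards` and `parse_number` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Extent = Tuple[str, int, int, int]   # (device type, drive, first cylinder, n cylinders)


def parse_number(field: str) -> int:
    # Assumed convention: a trailing period marks decimal; otherwise octal.
    return int(field[:-1]) if field.endswith(".") else int(field, 8)


def parse_vol_cards(deck: str) -> Dict[str, List[Extent]]:
    """Collect the extents of each logical volume from the VOL cards."""
    volumes: Dict[str, List[Extent]] = defaultdict(list)
    for card in deck.splitlines():
        fields = card.split()
        if not fields or fields[0] != "VOL":
            continue
        _, name, _use, dev_type, drive, *extents = fields
        if not extents or len(extents) % 2:
            raise ValueError(f"extents must come in <first, n> pairs: {card!r}")
        for first, count in zip(extents[0::2], extents[1::2]):
            volumes[name].append((dev_type, parse_number(drive),
                                  parse_number(first), parse_number(count)))
    return dict(volumes)
```

A logical volume that appears on more than one VOL card simply accumulates more than one extent.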


Directory Branch 


Several data items now stored in the branch for a segment
will be moved to the VTOC entry associated with the branch.
There is a one-for-one correspondence between branches and VTOC
entries, and the directory lock protects the VTOC entries as well
as the directory branches.


The directory information relating to the logical
organization of the data represented in the Storage System will
be stored in the branch; the information about the physical
storage will reside in the VTOC for the volume containing the
storage.


In particular, the following items will be removed from the
branch:


file map
device ID
date/time modified
date/time used
current length
records used
AST entry pointer


Most of these items will be moved to the VTOC entry, except for
the ASTEP, which is eliminated. Instead of inspecting the
directory to see if a segment is active, the supervisor will
search the AST for the unique ID of the segment. (A hash table
for the AST may be implemented to make this fast.)
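One way to make the unique-ID search fast is a chained hash table over the AST, sketched below. The bucket count, the use of `uid % nbuckets` as the hash, and the dict entries are illustrative assumptions, not a proposed AST layout.

```python
from typing import Dict, List, Optional


class ActiveSegmentTable:
    """AST searched by segment unique ID through a chained hash table.
    Bucket count and entry fields are hypothetical."""

    def __init__(self, nbuckets: int = 64) -> None:
        self.buckets: List[List[Dict]] = [[] for _ in range(nbuckets)]

    def _bucket(self, uid: int) -> List[Dict]:
        return self.buckets[uid % len(self.buckets)]

    def find(self, uid: int) -> Optional[Dict]:
        """Return the AST entry for uid, or None if the segment is not active."""
        for entry in self._bucket(uid):
            if entry["uid"] == uid:
                return entry
        return None

    def activate(self, uid: int, **fields) -> Dict:
        entry = self.find(uid)
        if entry is None:
            entry = {"uid": uid, **fields}
            self._bucket(uid).append(entry)
        return entry

    def deactivate(self, uid: int) -> None:
        bucket = self._bucket(uid)
        bucket[:] = [e for e in bucket if e["uid"] != uid]
```

Each lookup scans one bucket rather than the whole AST, and the directory branch is never consulted.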


In order to enable the supervisor to find the VTOC entry
given a branch, the directory entry will have a new item added:
a VTOC pointer stored in the branch which locates the VTOC entry.


For the branch for a directory segment, some items will be
added. The maximum records used for both directory and
non-directory records, and the logical volume identifier for
non-directory records, will be added to the directory branch in
order to make the activation of a directory which has a quota
simple. (Usage, time-page product, etc. will go in the
corresponding VTOC entry.)


One advantage of the division of information between the
VTOC and the branch is that directory branches need not be
modified when a segment is activated and need not even be
referenced when a segment is deactivated. In order to prevent
any directory page from being modified at segment-activation
time, the directory lock will also be moved to the AST entry for
the directory (since a new rule will be that a directory cannot
be deactivated while it is locked). This change should reduce
the paging traffic on the system, and will reduce the chances of
a directory page being damaged due to memory parity or disk
channel errors, since the page need not be written back to disk
after use.


One problem with this division of data is that the length
information for a segment is kept in the VTOC, and so the
operation of listing a directory requires the fetching of each
VTOC entry corresponding to an entry in the directory. As
compared to the current storage system, the new system will have
to do noticeably more I/O to return the same information.
Furthermore, the real-time delays associated with functions which
list a directory will increase significantly. It may be
necessary to store some of the length information in duplicate in
both the branch and the VTOC entry in order to allow the simplest
cases of directory listing to operate without referencing the
VTOC. Another alternative would be to change the list command so
that the default case does not provide any information which is
costly to obtain, and to provide new supervisor interfaces to
replace hcs_$star, which return only information kept in the
directory.


Directory Header 


Very little change will be made to the directory header. As
mentioned above, the directory lock will be moved from the
directory header to the AST entry in order to avoid unnecessary
modification of directory pages. The per-directory static
multilevel meters will be removed because nobody uses them.


The quota information now in the directory header will be
moved to the branch for the directory in the directory's parent,
or to the VTOC entry corresponding to the branch.


Each directory will gain one new item: the name and unique
ID of the logical volume where segments inferior to the directory
will be stored. This datum is also kept in the branch for the
directory, because it is used by the quota mechanism. This
attribute may be changed only for empty directories. Modify
permission on the directory is necessary in order to change it,
and it may not be changed to an arbitrary value: the user
changing the logical volume ID must be listed on the extended ACL
of the VDS for the volume, if the volume is a private volume.
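The conditions for changing a directory's logical volume ID can be written as a single predicate. This sketch simplifies the extended-ACL check to exact membership in a set of user names. The function name and its arguments are hypothetical.

```python
from typing import Optional, Set


def may_set_lvid(dir_is_empty: bool, has_modify: bool, user: str,
                 volume_is_private: bool, vds_acl: Optional[Set[str]]) -> bool:
    """Hypothetical check for changing a directory's logical volume ID.
    ACL matching is simplified to exact name membership."""
    if not dir_is_empty or not has_modify:
        return False
    if volume_is_private:
        # The user must appear on the extended ACL of the volume's VDS.
        return vds_acl is not None and user in vds_acl
    return True
```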


VTOC


The VTOC will be organized as a parallel set of fixed-size
arrays in a special region of the volume not available for
regular storage. One array will contain the VTOC information
used during normal operation, and the other arrays, called the
VTOC extension, will be used to hold the special salvaging
information.


VTOC Entry 


The VTOC entry for a segment will contain the following
items:


unique ID
date/time segment modified
date/time segment used
file map
current length
records used
directory switch
quota information (2 sets):
   records used
   time-record product
   time trp last updated
*primary name of segment
*unique ID pathname of parent


The items marked with an asterisk will be stored in the VTOC
extension for the convenience of the salvager. All other items
can be reconstructed from the AST entry contents, so that
deactivation does not require any reference to the directory
branch.
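The sketch below models the VTOC entry and its extension as parallel arrays indexed by the same VTOC index. Deactivation is written purely from AST-entry data, so no directory branch is referenced. Several details are assumptions: the field names and types, the dict standing in for an AST entry, and the time-record product rule of records used multiplied by elapsed time.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class QuotaCell:
    records_used: int = 0
    time_record_product: int = 0
    trp_last_updated: int = 0


@dataclass
class VtocEntry:                       # one element of the main VTOC array
    uid: int
    dtm: int = 0                       # date/time segment modified
    dtu: int = 0                       # date/time segment used
    file_map: List[Optional[int]] = field(default_factory=list)
    current_length: int = 0
    records_used: int = 0
    is_directory: bool = False
    quota: Tuple[QuotaCell, QuotaCell] = field(
        default_factory=lambda: (QuotaCell(), QuotaCell()))


@dataclass
class VtocExtension:                   # same index, parallel salvager-only array
    primary_name: str
    parent_uid_path: Tuple[int, ...]


def update_trp(cell: QuotaCell, now: int) -> None:
    # Assumed rule: charge records_used for the time since the last update.
    cell.time_record_product += cell.records_used * (now - cell.trp_last_updated)
    cell.trp_last_updated = now


def deactivate(vtoc: List[VtocEntry], vtocx: int, aste: dict, now: int) -> None:
    """Write a (hypothetical) AST entry back to its VTOC entry; no branch is read."""
    entry = vtoc[vtocx]
    if entry.uid != aste["uid"]:
        raise ValueError("AST entry does not match VTOC entry")
    entry.dtm, entry.dtu = aste["dtm"], aste["dtu"]
    entry.file_map = list(aste["file_map"])
    entry.current_length = aste["current_length"]
    entry.records_used = sum(addr is not None for addr in entry.file_map)
    for cell in entry.quota:
        update_trp(cell, now)
```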


File Map


The file map in the VTOC entry will use only 18 bits per
record address instead of 36, because the device ID can be
eliminated. The file maps in every VTOC entry will be of maximum
size, rather than the current situation where variable-size file
maps are permitted. Only 256K segments will actually use more
than 64 words of VTOC entry, but all 192 words will be read in by
the VTOC manager because it won't know the length of the VTOC
entry.
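An 18-bit record address means two addresses fit in each 36-bit word. A maximum-size file map for a 256-page (256K-word) segment therefore occupies 256 × 18 / 36 = 128 words. The sketch below packs and unpacks such a map, holding each 36-bit word in a Python int. The left-half/right-half ordering and the zero padding value are assumptions.

```python
ADDR_BITS = 18
ADDR_MASK = (1 << ADDR_BITS) - 1        # 0o777777
PAGES_PER_SEGMENT = 256                 # 256K words / 1024-word pages


def pack_file_map(addresses):
    """Pack 18-bit record addresses two per 36-bit word.
    An odd trailing slot is padded with 0 (assumed to mean "no record")."""
    addrs = list(addresses)
    if len(addrs) % 2:
        addrs.append(0)
    words = []
    for left, right in zip(addrs[0::2], addrs[1::2]):
        if not (0 <= left <= ADDR_MASK and 0 <= right <= ADDR_MASK):
            raise ValueError("record address does not fit in 18 bits")
        words.append((left << ADDR_BITS) | right)
    return words


def unpack_file_map(words, npages):
    """Recover the first npages record addresses from packed words."""
    addrs = []
    for word in words:
        addrs.extend(((word >> ADDR_BITS) & ADDR_MASK, word & ADDR_MASK))
    return addrs[:npages]
```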


Volume Map


The Volume Map for a volume has one entry for each record on
the volume. The current system's analogue to the Volume Map is a
wired-down data base, the bit map portion of the FSDCT, which has
one bit for each record. As the amount of physical storage in a
configuration increases, this data base becomes too large to wire
down, and so it will be allowed to reside on the volume it
describes.


Page Table Word


Since a full disk address will no longer fit into 22 bits
(18-bit address plus 4-bit device ID) as it does in the current
system, the format of a PTW for a page which is not in core must
change slightly.


The new format of the PTW has the 18-bit volume address
only; the index into the device table which completes the disk
address is the same for all pages of the segment and so goes in
the AST entry.


A flag bit is turned ON in the PTW if the page is all
zeros. Such a page can be freed when the segment is
deactivated.


A flag bit is turned ON in the PTW if the page has never
been assigned. If a reference is made to such a page, a page
will be assigned at page-out time.
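The disk-resident PTW state and its two flags could be handled as in the sketch below. Several elements come from the text: the 18-bit volume address, the device-table index kept in the AST entry, and the existence of the zero-page and never-assigned flags. The flag bit positions, the function names, and the list of free records are hypothetical, and the PTW is modeled as a plain integer.

```python
ADDR_MASK = (1 << 18) - 1    # 18-bit volume address
ZERO_PAGE = 1 << 18          # hypothetical bit: page known to be all zeros
NEVER_ASSIGNED = 1 << 19     # hypothetical bit: no disk record assigned yet


def disk_address(ptw: int, aste: dict):
    """Full disk address: device table index from the AST entry plus the
    18-bit volume address from the PTW."""
    if ptw & NEVER_ASSIGNED:
        raise ValueError("page has no disk record")
    return aste["device_index"], ptw & ADDR_MASK


def page_out(ptw: int, all_zero: bool, free_records: list) -> int:
    """Return the PTW's disk state after writing the page out,
    assigning a record first if the page never had one."""
    addr = free_records.pop() if ptw & NEVER_ASSIGNED else ptw & ADDR_MASK
    return addr | (ZERO_PAGE if all_zero else 0)


def deactivate_pages(ptws: list, free_records: list) -> None:
    """At deactivation, free the records of pages flagged as all zeros."""
    for i, ptw in enumerate(ptws):
        if ptw & ZERO_PAGE and not ptw & NEVER_ASSIGNED:
            free_records.append(ptw & ADDR_MASK)
            ptws[i] = NEVER_ASSIGNED
```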


AST Entry 


Several items must be added to the AST entry to support the
new Storage System. These include:


device table index
VTOC index
directory lock
date/time modified
directory switch
logical volume ID
non-dir quota cell
dir quota cell


Several other items must be changed. The "dnzp" switch, if still
necessary, changes in meaning, since zero pages are nulled at
deactivation instead of page fault time. The "did" moves to the
device table. The "pomi" and the "movdid" items are obsolete.


The units for "csl" and "np" should probably be 16-word
blocks instead of 1024-word pages, in case we ever experiment
with changing page size. The "misw" flag should be renamed the
"in_pdir" flag for clarity.


Device Table


The device table is a new wired data base which replaces
some of the functions of the FSDCT in the current system. It has
one entry for each disk drive available on the system.


In each entry, the following information is kept:


VOLUID
LVID
DIM type (device ID)
volume state
disk DIM data: channel, drive, etc.
sensitivity level and category
Volume Map location
read-only switch
system-volume switch
number of free records
volume-coming-down switch
Volume Label


The label for a volume in the new Storage System is checked
when the volume is connected. It is located at a fixed address
on the volume known to the DIM, and contains the addresses of the
VTOC and the Volume Map. It also contains data used to verify
that the volume is correctly mounted, such as:


VOLUID
sensitivity level and category
date/time initialized
volume name
manufacturer's serial number
date/time last mounted
date/time last salvaged
error history, bad track list


Disk Layout for DSU-191's


The DSU-191 disks will be arranged to take advantage of the
physical characteristics of the disk drive. The disk DIM for the
191's will be the only module which knows what strategies have
been used in arranging the data on the disk (except for BOS).


Since the first four cylinders of a 191 pack are guaranteed
error-free, the label for the volume will be placed somewhere in
these four cylinders. The VTOC extension will also reside in
this area. It is tempting to put the VTOC and Volume Map there
too, in order to use the most reliable cylinders on the disk, but
the VTOC and Volume Map should probably reside at the middle
cylinders of the disk, in order to minimize average seek time.
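A rough calculation supports the preference for the middle cylinders. It assumes that references are spread uniformly across the cylinders and that seek cost grows with the distance travelled. Under those assumptions, the arm travels about half the pack on average from an edge cylinder but only about a quarter from the middle. The 404-cylinder figure matches the DISK volumes in the example configuration deck, and `mean_seek_distance` is an illustrative name.

```python
def mean_seek_distance(home: int, ncyl: int) -> float:
    """Average cylinders travelled from `home` to a uniformly chosen cylinder."""
    return sum(abs(home - c) for c in range(ncyl)) / ncyl


edge = mean_seek_distance(0, 404)      # VTOC kept in the first cylinders
middle = mean_seek_distance(202, 404)  # VTOC kept at the middle cylinder
```

For a 404-cylinder pack this gives 201.5 cylinders from cylinder 0 and 101.0 from cylinder 202.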


Volume Description Segment


Each logical volume which can be mounted in response to a
user request will have a corresponding Volume Description Segment
in a per-system directory. The exact form of the
volume-registration data base is currently being redesigned, but
whatever volume-registration data base the system finally ends up
with, the per-volume data will include:


Logical Volume ID
List of Physical Volumes
List of users who may set quotas for this volume
Name, address, account number, etc. for billing


